Strong Secrecy and Reliable Byzantine Detection in the Presence of an Untrusted Relay

Xiang He Aylin Yener 
Wireless Communications and Networking Laboratory 
Electrical Engineering Department 
The Pennsylvania State University, University Park, PA 16802 
xxhll9@psu.edu yener@ee.psu.edu 
March 31, 2010 

Abstract 

We consider a Gaussian two-hop network where the source and the destination can communicate 
only via a relay node who is both an eavesdropper and a Byzantine adversary. Both the source and the 
destination nodes are allowed to transmit, and the relay receives a superposition of their transmitted 
signals. We propose a new coding scheme that satisfies two requirements simultaneously: the transmitted
message must be kept secret from the relay node, and the destination must be able to detect, quickly and
reliably, any Byzantine attack that the relay node might launch. The three main components of the scheme
are nested lattice codes, privacy amplification and algebraic manipulation detection (AMD)
codes. Specifically, for the Gaussian two-hop network, we show that lattice coding can successfully pair
with AMD codes, enabling their first application to a noisy channel model. We prove, using this new
coding scheme, that the probability that the Byzantine attack goes undetected decreases exponentially 
fast with respect to the number of channel uses, while the loss in the secrecy rate, compared to the rate 
achievable when the relay is honest, can be made arbitrarily small. In addition, in contrast with prior 
work in Gaussian channels, the notion of secrecy provided here is strong secrecy.

I. Introduction

Information theoretic secrecy, first proposed by Shannon [1], provides confidentiality of transmitted
information against an adversary regardless of its computational power. Shannon proved
that if the adversary has access to the signals transmitted by the sender of the secret message
through a noiseless channel, then, to achieve perfect secrecy from the adversary, the sender and
the receiver have to share a secret key of the same length as the message. Although Shannon's
result implied that secret communication was impractical in this setting, it was later shown
by Wyner that this pessimistic result was a consequence of the noiseless channel assumption.
Specifically, it was shown that when the adversary has noisy observations of the signals
transmitted by the sender, a nonzero transmission rate for the secret message is achievable
without requiring the transmitter to pre-share a key with the receiver. More recently, the
fundamental rate limits at which secret communication can take place in the presence of an
eavesdropper were studied for a number of multi-terminal models, e.g., the broadcast channel 0,
[0, the two-way channel 0, [H, the multiple access channel and the interference channel
M, [10].
Secure communication for channel models with a relay node has been studied from a variety
of perspectives, including the relay node as a helper to the legitimate communication link [11]
or to an eavesdropper [12]. References [13]-[16] consider the case where the relay node itself
is the eavesdropper from whom the information transmitted from the source to the destination
must be kept secret. This setting, which provides theoretical foundations toward the utilization of
untrusted relay nodes in network design, is relevant in practice: The potentially untrusted routers
of today's Internet routinely relay sensitive information for its users. The current approach is
that the authenticity and secrecy of the information are protected by security protocols that assume
these routers are limited in computational power [17]. It is interesting to address the role of
these routers if they are adversaries with unlimited computational power.

To answer this question, in 0, [14], [15], as a first step, we considered the case where the
relay node was "honest but curious". This means that the curious relay node is not trusted with
confidential messages. On the other hand, it is honest, and thus conforms to the system rules
and performs the designated relaying scheme. Reference [14] considered the three-node relay
network with such a relay. References 0, [15] considered the two-way relay channel where two
nodes could only communicate through such a relay node. In these works, we showed that if the
relay was not trusted but honest, recruiting it to help relay information was useful in achieving
a higher secrecy rate than simply treating the relay node as an eavesdropper. This effect is most
pronounced in the two-hop model studied in [15], in which the achievable rate is zero if the relay
node is excluded from communication, and increases to within 1 bit of the rate of having a
trusted relay if the untrusted relay node is properly utilized. Similar observations can be made
in networks with multiple confidential messages [16].

The next natural step is to consider the problem where the relay node is curious and is
potentially dishonest. This means that the relay can deviate from its designated behavior. This
can be as benign as the relay node experiencing a failure and stopping transmission, which is
obviously easy to detect. However, if the relay is a malicious entity (or is captured by one),
a more detrimental scenario can materialize. Specifically, the relay can attempt to deceive the
destination into accepting a counterfeit message by actively manipulating the signals it relays.
Such behavior is a "Byzantine attack" [18]. When the adversary is limited in computational
power, this type of attack can be detected via message authentication codes or digital signatures
[17]. The security guarantee promised by these schemes is essentially based on the absence of
known effective attack strategies and the fact that their reliability can be proved if a very small
set of assumptions is made.

In this work, we tackle the case where the Byzantine adversary has unlimited computational
power. In an effort to demonstrate the simplest network which relies on an untrusted node
to communicate, we consider a two-hop network [15]. In contrast to reference [13], which
considered an honest but curious relay, we allow the relay node to actively modify the transmitted
signal in any way it desires. The goal of the destination thus becomes detecting, quickly and
reliably, any alteration of the message whenever the relay node chooses to make one.

Toward accomplishing this goal, there are several known results that can be leveraged, each
with their own limitations. For example, Byzantine attack detection can be viewed as an
authentication problem, by treating the counterfeit message W as a message from a "wrong"
source node. An information theoretic secrecy scheme with an authentication capability was
proposed in [19]. However, like other message authentication codes [20], the source has to share
an authentication key with the destination beforehand.

It is known, on the other hand, that to detect the Byzantine attack, which is a milder requirement
than authentication, it is not essential to share keys. In reference [21], the so-called algebraic
manipulation detection (AMD) code was used to encode the data from the source node, which
ensures that the probability that the Byzantine attack succeeds can be made arbitrarily small with
an arbitrarily small loss in rate. A limitation of this scheme is that it has to be used along with
a secret sharing scheme that has a certain linearity property [21], which is easily fulfilled in a
noiseless network as shown in [18], [22]. Indeed, in [22], we considered a deterministic two-hop
network and it was shown that by using the AMD code, the probability that the Byzantine adversary
wins decreases exponentially fast with respect to the total number of channel uses n', while the
loss in rate can be made arbitrarily small. On the other hand, for noisy channels, secret sharing
schemes generally fail to have the required linearity property. As a result, to date the strongest
result that could be obtained is that, for a noisy two-hop network, the probability that a
Byzantine attack goes undetected decreases exponentially only with respect to $\sqrt{n'}$, as in [22].

The main contribution of this work is to demonstrate that for the Gaussian two-hop network, 
the probability that a Byzantine attack goes undetected, i.e., the adversary wins, also decreases 
exponentially fast with respect to n', while the loss in secrecy rate can be made arbitrarily 
small. Hence, the same result achievable for the deterministic two-hop network is attainable for 
this noisy two-hop network. This represents a departure from traditional security approaches 
that assume a noiseless bit pipe for communication and brings the physical characteristics of 
the channel into the picture while providing a guarantee thought to be possible only with the 
noiseless setting. The key to proving this result is the introduction of a new strong secrecy scheme.
Its existence is proved via the representation theorem derived in [10], [23] and the privacy
amplification technique presented in [24], [25]. Compared to previously known strong secrecy
schemes, the main differences are:

1) Unlike the randomly generated codes in [26], the decoder of the new scheme is linear for
certain rate configurations.

2) Unlike [10], [23], the codeword consists of a single lattice point rather than multiple
lattice points. This allows the mutual information between the confidential message and the
eavesdropper's observation to decrease exponentially with respect to n'. Hence the notion
of secrecy provided by this scheme is stronger than the commonly used strong secrecy notion,
which only requires this mutual information to vanish with respect to n'.

The first item provides the linear property required by the AMD code. The stronger-than-usual
secrecy notion in the second item is essential in preserving the Byzantine detection performance
offered by the AMD code. As will be shown in Section VI, the commonly used strong secrecy
notion, as in [25], [27], is insufficient for this purpose.

There is other work in Byzantine detection from which this work differs. Notably, reference
[28] proposed to use the sender of the confidential message to monitor the behavior of the relay
node. This so-called "watchdog" scheme could also have been used in the setting we consider if
the message in transmission were not to be kept secret from the relay node. However, when the
message is confidential, using a "watchdog" is not possible. This is because there is no direct link
between the two legitimate communicating nodes, which means the sender has no information
regarding the signals transmitted by the destination. As will be explained in Section IV, these
signals are necessary in order to deploy cooperative jamming to keep the message secret
from the relay node, see also [15]. Since the received signals at the relay are garbled by signals
transmitted by the destination, so are the signals transmitted from it. This prevents the source
from detecting whether the relay misbehaves by just looking at its transmitted signals without
the knowledge of the signals transmitted from the destination.

This work should also be differentiated from references [29]-[32]. In these works, the adversaries
can also actively manipulate the signals received by the destination. However, the purpose
is to find a way for reliable communication in the presence of such adversaries carrying out 
the worst-case attack. In the two-hop network considered in this work, this is not possible since 
there is no direct link between the two legitimate communicating nodes. Hence, when Byzantine 
behavior is detected, we need to forgo the relay. 

The remainder of the paper is organized as follows: In Section II, we describe the system
model and formulate the Byzantine detection and secrecy problem. In Section III, we review
known Byzantine detection schemes, in particular, the AMD code, and describe the technical
obstacles to be overcome in this work. Sections IV-VII describe the main components of the strongly
secure scheme proposed in this work and how it can be combined with AMD codes for Byzantine
detection purposes. Section VIII concludes the paper.

Fig. 1. The Gaussian two-hop network. Phase 1 is indicated by solid line, and phase 2 by dashed line. R/E: Relay/Eavesdropper.
$Y_1$ is not shown.

II. System Model and Problem Formulation 

The Gaussian two-hop network with a Byzantine relay node is shown in Figure 1. In this
model, node 1 wants to send a confidential message $W$ to node 2. Since it cannot communicate
with node 2 directly, it recruits the help of a relay node, which is not trusted with the message
$W$. The signal received by the relay node consists of the signals transmitted by both nodes 1 and
2, and the signal broadcast by the relay node is heard by both nodes as well. These are fitting
assumptions for wireless communication. Let $X_i$, $i = 1, 2$, and $X_r$ denote the signals transmitted by
nodes 1, 2 and the relay. Let $Y_i$, $i = 1, 2$, and $Y_r$ denote their received signals, respectively. After
normalizing the channel gains, we have

$$Y_r = X_1 + X_2 + Z_r \qquad (1)$$
$$Y_2 = X_r + Z_R, \quad Y_1 = h X_r + Z'_R \qquad (2)$$

where $Z_r$, $Z_R$ and $Z'_R$ are independent Gaussian random variables with zero mean and unit
variance, and $h$ is the normalized channel gain. Since $Y_1$ is not used in the scheme described in this
work, it is omitted in Figure 1 for clarity. We assume each node is half-duplex. For simplicity,
we assume the relay node transmits during half of all channel uses. Without loss of generality,
we assume nodes 1 and 2 do not transmit when the relay node transmits, since the relay node
cannot receive and relay their transmitted signals simultaneously. We also assume that during the $n$
channel uses in which the relay node transmits, its transmission power averaged over these channel
uses should not exceed $P$. During the remaining $n$ channel uses, in which nodes 1 and 2 may transmit,
the transmission power of each of these two nodes averaged over these channel uses should not
exceed $P$.

We assume the Byzantine adversary at the relay node can employ any stochastic function to
compute its current transmitted signal. Let $X_{r,i}$ be its transmitted signal at the $i$th channel use.
Let $M_r$ be the local randomness available to the relay node. Let $Y_r^{i-1}$ be the signals it received
in the past. Let $W$ be the confidential message it is currently relaying. Let $f_i$ be the relaying
function. Then the attacker (relay) can compute:

$$X_{r,i} = f_i(M_r, Y_r^{i-1}, W) \qquad (3)$$

It might seem inconsistent at first glance to assume the Byzantine adversary knows the message,
which should be kept secret from the relay node in the first place. However, when the possible
choices for $W$ are limited, for example, to being binary, the attacker has a non-negligible
probability of guessing it correctly. This can also happen when the channel is used to transmit
data with high redundancy and stringent latency requirements, so that adjacent messages are
highly likely to share the same value. If, somehow, the adversary has access to earlier messages,
it can guess the value of the current message with high probability of success. As a result, it is
common practice to design a reliable message authentication scheme by assuming the adversary
knows the message [20, Definition 4.2]. Here too, we follow this convention.

The Byzantine detection problem for secure communication using an untrusted relay can be
stated as follows:

Let the total number of channel uses be $n' = 2n$, during which each node transmits during $n$
channel uses. Let $\hat{W}$ be the estimate of $W$ computed by the destination, i.e., node 2, based on
its observation. Note that because the relay can be a Byzantine adversary, node 2 may or may
not accept $\hat{W}$ as a genuine message from node 1 based on certain criteria.

Definition 1: [20] A function of $n$, $\gamma_n$, is negligible if for any polynomial of $n$ with a finite
degree, $\mathrm{poly}(n)$, we have:

$$\lim_{n \to \infty} \mathrm{poly}(n)\, \gamma_n = 0 \qquad (4)$$

□



We wish to find the secrecy rate $R_e$ of $W$, defined as

$$R_e = \lim_{n \to \infty} \frac{1}{n'} H(W) \qquad (5)$$

such that the following conditions hold:

1) When the relay node is honest, and $W$ is uniformly distributed over the message set, then
both $\Pr(W \neq \hat{W})$ and

$$\Pr(\hat{W} \text{ is not accepted by Node 2} \mid \hat{W} = W) \qquad (6)$$

should be negligible as per Definition 1. Hence, the transmission of $W$ is reliable.

2) For all $w$ in the message set, the probability that the adversary wins, $\Pr(A \text{ wins})$, given
by

$$\Pr(A \text{ wins}) = \Pr(\hat{W} \text{ is accepted by Node 2} \mid W = w, \hat{W} \neq w) \qquad (7)$$

is negligible. Hence any modification of $W$ is detected reliably.

3) $I(W; Y_r^n)$ is negligible. Since $Y_r^n$ is the observation of the eavesdropper, this means the
information that the adversary has regarding the value of $W$ is negligible.

Remark 1: Observe that the condition of reliable Byzantine detection in (7) is independent
of the distribution of $W$. □

III. Known Byzantine Detection Schemes 

As mentioned in the introduction, when there are no secrecy concerns at the relay, whether 
the relay is honest or not can be checked by the source node, i.e., node 1, by examining $Y_1$.
However, since there are secrecy constraints in our model, applying sender-based Byzantine 
detection approach is not feasible. Therefore, we will concentrate on a receiver-based approach 
called algebraic manipulation detection (AMD) code in the sequel. 

The AMD code was formally defined in [21]. An AMD codeword is composed of three parts:
$\{s, x, h\}$, where $s$ is the $d \times 1$ vector over $GF(q^r)$ representing the message. The component $x$ is
called the random seed and is generated from $GF(q^r)$ by the encoder itself. $h$ is the hash tag
and is computed according to the hash rule:

$$h = x^{d+2} + \sum_{i=1}^{d} s_i x^i \qquad (8)$$

where $s_i$ is the $i$th component of $s$ and the addition and multiplication are defined over $GF(q^r)$.
Suppose node 2 receives $s', x', h'$, where $s' \neq s$. Let $\Delta_x = x' - x$, $\Delta_h = h' - h$. Then [21]
has the following result:

Theorem 1: [21, Theorem 2] Assume at least one of $s' - s$, $\Delta_x$, $\Delta_h$ is not zero. If the
distribution of $x$ conditioned on $\{\Delta_x, \Delta_h, s', s\}$ is uniform over the field $GF(q^r)$, $q$ being a
prime, and $d + 2$ is not divisible by $q$, then the probability that the hash rule (8) holds for
$\{s', x', h'\}$ is bounded by $\frac{d+1}{q^r}$.

Remark 2: The rate of the AMD code is $\frac{d}{d+2}$. The rate can be made arbitrarily close to 1 by
choosing a large enough value for $d$.

On the other hand, an AMD codeword can be represented by less than $(d + 2) r \log_2 q + 1$
bits. Hence, if we fix $d$ and $q$, the codeword length is a linear function of $r$. Consequently, for
a given code rate, the probability that $\{s', x', h'\}$ can pass the hash rule check (8) decreases
exponentially fast with respect to the codeword length. □
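
The hash rule (8) and the bound of Theorem 1 can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not part of the original scheme: the field is a small prime field (the case r = 1), the values `P = 101` and `D = 3` are arbitrary, and the function names are hypothetical. The manipulation is additive and chosen independently of the seed x, which is the condition Theorem 1 requires.

```python
import secrets

P = 101  # illustrative prime (assumption); stands in for GF(q^r) with r = 1
D = 3    # message length d; d + 2 = 5 is not divisible by P, as Theorem 1 requires


def amd_tag(s, x, p=P):
    """Hash rule (8): h = x^(d+2) + sum_{i=1..d} s_i * x^i, computed in GF(p)."""
    d = len(s)
    h = pow(x, d + 2, p)
    for i, s_i in enumerate(s, start=1):
        h = (h + s_i * pow(x, i, p)) % p
    return h


def amd_encode(s, p=P):
    """Return the codeword (s, x, h) with a fresh uniformly random seed x."""
    x = secrets.randbelow(p)
    return list(s), x, amd_tag(s, x, p)


def amd_verify(s, x, h, p=P):
    """Accept the received triple only if it satisfies the hash rule."""
    return amd_tag(s, x, p) == h


def count_undetected(s, delta_s, delta_x, delta_h, p=P):
    """Count the seeds x for which a fixed additive manipulation passes the check."""
    passed = 0
    for x in range(p):
        h = amd_tag(s, x, p)
        s_bad = [(a + b) % p for a, b in zip(s, delta_s)]
        if amd_verify(s_bad, (x + delta_x) % p, (h + delta_h) % p, p):
            passed += 1
    return passed
```

For a fixed manipulation with s' different from s, the check reduces to a nonzero polynomial equation in x of degree at most d + 1, so at most d + 1 of the P possible seeds let it pass. Dividing `count_undetected(...)` by `P` therefore gives a value no larger than (d + 1)/P, which is the bound of Theorem 1 specialized to r = 1.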

Despite the excellent performance of the AMD code, applying it in a noisy channel is
difficult. This is exemplified by the condition in Theorem 1: The distribution of $x$ conditioned
on $\{\Delta_x, \Delta_h, s', s\}$ must be uniform over the field $GF(q^r)$. In a noisy channel, in general, $\Delta_x$
and $x$ are not independent. In the two-hop network considered in this work, this can be seen
from the expression of $\Delta_x$. Let $g$ be the decoding function used by node 2. Let $Y_2^n$ be the signal
received by node 2 if the relay is honest. Otherwise, we denote it by $\tilde{Y}_2^n$. Assume the decoding
result is correct at all nodes if the relay is honest. In this case, $\Delta_x$ is given by:

$$\Delta_x = x' - x \qquad (9)$$
$$= g(\tilde{Y}_2^n, \cdot) - g(Y_2^n, \cdot) \qquad (10)$$

By observing (10), we notice the condition in Theorem 1 can be fulfilled if $g$ is linear in its
first parameter and $\tilde{Y}_2^n - Y_2^n$ is independent of $x$. In general, $g$ is not linear. Even if this is
the case, it is also difficult to achieve independence between $\tilde{Y}_2^n - Y_2^n$ and $x$. Since both $\tilde{Y}_2^n$
and $Y_2^n$ are signals transmitted by the relay corrupted by the channel noise, the joint distribution
of $\tilde{Y}_2^n - Y_2^n$ and $x$ can be made close to an independent distribution if the relay node has
negligible information regarding the value of $x$. But it remains to be seen whether the performance
guarantee in Theorem 1 can be preserved when $\tilde{Y}_2^n - Y_2^n$ and $x$ are almost independent rather
than truly independent. In the sequel, we will propose a strong secrecy scheme that overcomes
these problems.

IV. Lattice Coding Scheme

We first briefly review the communication scheme when the relay is "honest but curious", on 
top of which we will build the strong secrecy scheme and the Byzantine detection scheme in 
the sequel. 

Since each node is half-duplex, naturally we have a two-phase scheme. In phase one, nodes 1
and 2 transmit, and the relay node receives. In phase two, the relay transmits. For simplicity, we
assume that each phase occupies the same number of channel uses. It was shown in [15] that
these two phases can be used to facilitate the transmission of the confidential message $W$ from
node 1 to 2: The channel alternates between phase one and phase two. During phase one, node
1 transmits the confidential message via $X_1$ and at the same time node 2 sends a signal $X_2$ to
jam the relay node. During phase two, the relay node transmits to node 2 based on the signal it
received during phase one. Since node 2 knows $X_2$, it can subtract it to obtain a clean signal.
The relay node, however, does not know $X_2$ and hence can only observe a noisy version of $X_1$.
Intuitively, this means node 1 can transmit to node 2 at a rate higher than the relay node can
decode, and that this excess rate can be used to convey confidential messages. This idea was
formalized in [15] using compress-and-forward relaying and in [23] using compute-and-forward
relaying. In this work, we focus on the compute-and-forward scheme as it offers the algebraic
structure that facilitates detection of a Byzantine attack.

In the compute-and-forward scheme, the signals transmitted by the two legitimate nodes are
taken from the same nested lattice codebook. This scheme was first proposed in [33] for a
Gaussian two-way relay channel without eavesdroppers. Later, the scheme was used in [23] as
a building block to transmit confidential messages when the relay is honest but curious, i.e., is
an eavesdropper but not a Byzantine adversary. The lattice coding scheme is described next for
completeness:

We begin by introducing basic notations for the nested lattice structure: For a lattice $\Lambda_c$, the
modulus operation $x \bmod \Lambda_c$ is defined as $x \bmod \Lambda_c = x - \arg\min_{t \in \Lambda_c} d(x, t)$, where $d(x, t)$ is
the Euclidean distance between $x$ and $t$. The fundamental region of a lattice, $\mathcal{V}(\Lambda_c)$, is defined as
the set $\{x : x \bmod \Lambda_c = x\}$. A pair of $N$-dimensional lattices $\{\Lambda, \Lambda_c\}$ is said to have a nested
structure if $\Lambda_c \subset \Lambda$ [34].

Now consider an $N$-dimensional nested lattice pair $\{\Lambda, \Lambda_c\}$ which is properly designed
as in [34]. The signal transmitted by each node is given by

$$X_i^N = (t_i^N + d_i^N) \bmod \Lambda_c, \quad i = 1, 2 \qquad (11)$$

where $t_i^N \in \Lambda \cap \mathcal{V}(\Lambda_c)$, and $d_i^N$, $i = 1, 2$, are two fixed vectors in $\mathcal{V}(\Lambda_c)$ and are known by
the relay node. For our purpose, $t_1^N$ will be computed from the confidential message. $t_2^N$ is
independent of $t_1^N$ and is chosen from $\Lambda \cap \mathcal{V}(\Lambda_c)$ according to a uniform distribution. As a
result, $X_2^N = (t_2^N + d_2^N) \bmod \Lambda_c$ serves as the jamming signal to confuse the untrusted relay node.

An honest relay node will then decode $(t_1^N + t_2^N) \bmod \Lambda_c$ and transmit $(t_1^N + t_2^N + d_3^N) \bmod \Lambda_c$
during phase two, where $d_3^N$ is a fixed vector in $\mathcal{V}(\Lambda_c)$ and is known by node 2. Node 2 then
decodes $t^N = (t_1^N + t_2^N) \bmod \Lambda_c$ from the signal it received during phase two. An estimate of $t_1^N$,
denoted by $\hat{t}_1^N$, is then computed from $(\hat{t}^N - t_2^N) \bmod \Lambda_c$.

Let $|\mathcal{S}|$ denote the cardinality of a set $\mathcal{S}$. Define $R$ as

$$R = \frac{1}{N} \log_2 |\Lambda \cap \mathcal{V}(\Lambda_c)| \qquad (12)$$

Then it was shown in [33] that, if

$$R < \frac{1}{2} \log_2\left(\frac{1}{2} + P\right) \qquad (13)$$

the probability $\Pr(\hat{t}_1^N \neq t_1^N)$ decreases exponentially with respect to $N$.

Remark 3: It is clear that if the relay chooses to transmit $(t'^N + d_3^N) \bmod \Lambda_c$ for some arbitrary
$t'^N \in \Lambda \cap \mathcal{V}(\Lambda_c)$, then node 2 will be forced to accept a message that did not originate from
node 1. This shows that unless some proper measure is taken, a Byzantine attack can quite easily
succeed in this scenario. □

Remark 4: $d_i^N$, $i = 1, 2, 3$, are conventionally defined as random variables uniformly distributed
over $\mathcal{V}(\Lambda_c)$ [34]. The reason for defining them to be random is that it is easier to analyze the
average error performance of an ensemble of lattice codebooks parameterized by the dithering
vectors than to analyze the error performance of a specific lattice codebook [35]. However,
from the result on the average performance, we can also claim that there must exist some
fixed $d_i^N$, $i = 1, 2, 3$, which correspond to fixed lattice codebooks in the ensemble, and these
$d_i^N$, $i = 1, 2, 3$, also provide vanishing error probability and meet the average power constraints
[10]. Hence in the sequel we assume $d_i^N$, $i = 1, 2, 3$, are fixed. □

Fig. 2. The lattice-input wiretap channel, with a main channel and an eavesdropper's channel.
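
The two-phase lattice scheme of Section IV, together with the substitution attack of Remark 3, can be traced in a small noise-free sketch. The assumptions here are illustrative and not taken from the paper: the fine lattice is the integer lattice Z^N, the coarse lattice is qZ^N with q = 5, the channel noise is omitted, and the names `mod_coarse`, `two_hop_demo` and `attack_offset` are hypothetical.

```python
import math
import random


def mod_coarse(x, q):
    """x mod the coarse lattice q*Z^N: subtract the nearest point of q*Z^N."""
    return [v - q * math.floor(v / q + 0.5) for v in x]


def two_hop_demo(q=5, N=8, seed=0, attack_offset=None):
    """Noise-free run of the two-phase scheme; returns (t1, t1_hat)."""
    rng = random.Random(seed)
    points = range(-(q // 2), q // 2 + 1)          # Z intersected with [-q/2, q/2), q odd
    t1 = [rng.choice(points) for _ in range(N)]    # lattice point carrying the message
    t2 = [rng.choice(points) for _ in range(N)]    # node 2's uniform jamming point
    d1, d2, d3 = ([rng.uniform(-q / 2, q / 2) for _ in range(N)] for _ in range(3))

    # Phase one: both nodes transmit as in (11); the relay observes the sum.
    x1 = mod_coarse([t + d for t, d in zip(t1, d1)], q)
    x2 = mod_coarse([t + d for t, d in zip(t2, d2)], q)
    y_r = [a + b for a, b in zip(x1, x2)]

    # The relay removes the known dithers and decodes (t1 + t2) mod the coarse lattice.
    t_sum = [round(v) for v in mod_coarse([y - a - b for y, a, b in zip(y_r, d1, d2)], q)]
    if attack_offset is not None:                  # Byzantine relay alters the lattice point
        t_sum = mod_coarse([t + o for t, o in zip(t_sum, attack_offset)], q)

    # Phase two: the relay forwards the (possibly altered) point with dither d3.
    x_r = mod_coarse([t + d for t, d in zip(t_sum, d3)], q)

    # Node 2 removes d3, then subtracts its own jamming point t2.
    t_hat = [round(v) for v in mod_coarse([x - d for x, d in zip(x_r, d3)], q)]
    t1_hat = mod_coarse([t - j for t, j in zip(t_hat, t2)], q)
    return t1, t1_hat
```

With an honest relay, `two_hop_demo()` returns two equal lists. Passing, for example, `attack_offset=[1] * 8` makes node 2 decode a different but perfectly valid lattice point, which is the undetected substitution that Remark 3 warns about and that the AMD code is meant to catch.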

V. Using Nested Lattice Codes to Provide Strong Secrecy 

For the lattice coding scheme described in Section IV, the two-hop network is equivalent
to the lattice-input wiretap channel shown in Figure 2. The main channel takes input
$t_1^N \in \Lambda \cap \mathcal{V}(\Lambda_c)$, and produces output $\hat{t}_1^N$. The eavesdropper channel also takes input $t_1^N$, and has the
same observation as the signals received by the relay node in the two-hop network. The only
difference from the original two-hop network is that in the two-hop network, it takes another
$N$ channel uses for the relay to relay the lattice point to node 2, during which nodes 1 and 2 do
not transmit. Here, to simplify the argument, we omit this detail and will take these additional
channel uses into account when we revisit the two-hop network in Section VI. Here, we simply
assume that in the lattice-input wiretap channel, the transmitter transmits in each channel use
and its average power constraint is given by $P$.

In the sequel, we will design a coding scheme for the lattice-input wiretap channel to transmit
a confidential message $W$ reliably such that the following strong secrecy condition holds:

$$I(W; Y_r^N) < \exp(-\alpha N), \quad \alpha > 0 \qquad (14)$$

A sufficient condition for (14) to hold is:

$$I(W; \bar{Y}_r^N) < \exp(-\alpha N), \quad \alpha > 0 \qquad (15)$$

where $\bar{Y}_r^N$ is obtained by subtracting the channel noise from $Y_r^N$:

$$\bar{Y}_r^N = (t_1^N + d_1^N) \bmod \Lambda_c + (t_2^N + d_2^N) \bmod \Lambda_c \qquad (16)$$

A. Strongly Secure Scheme 




1) When $\Lambda_c = q\Lambda$ for a prime $q$: The self-similar nested lattice code with prime nesting ratio,
i.e., $\Lambda_c = q\Lambda$, is a special case of the good nested lattice ensemble proposed in [34, Section 7].
We first consider this case since when $q$ is a prime, the set $(\Lambda + d^N) \cap \mathcal{V}(\Lambda_c)$ is isomorphic to
a finite field, as shown by the following lemma:

Lemma 1: When $\Lambda_c = q\Lambda$ for a prime $q$ and the generator matrix of $\Lambda$ has full rank,
$(\Lambda + d^N) \cap \mathcal{V}(\Lambda_c)$, under the modulo-$\Lambda_c$ addition operation, is isomorphic to the additive
group of the finite field $GF(q^N)$.

Proof: The proof is provided in Appendix A. ■

Remark 5: The isomorphism in Lemma 1 is not affected by the choice of $d^N$. The fixed dithering
vector $d^N$ is simply used to control the average power of the lattice codebook. □

As we will show later in the proof of Theorem 2, the isomorphism property proved by Lemma 1
allows the resulting decoder to be linear and proves to be of critical importance in the Byzantine
detection scheme in Section VI.

The next theorem establishes the existence of the strong secrecy scheme.

Theorem 2: For a given constant $\epsilon > 0$ that can be arbitrarily small, assume $q$ is a prime
large enough such that

$$1 - \frac{1 + \epsilon}{\log_2 q} > 0 \qquad (17)$$

Then for an integer $r$, such that

$$0 < r < N\left(1 - \frac{1 + \epsilon}{\log_2 q}\right) \qquad (18)$$

there exists a linear mapping $g$ from $GF(q)^N$ to $GF(q)^r$ such that

1) $g$ has full row rank $r$.

2) When $t_i^N$, $i = 1, 2$, are uniformly distributed over $(\Lambda + d_i^N) \cap \mathcal{V}(\Lambda_c)$ and are independent
of each other, there exists a positive constant $\beta$ such that

$$I(g(t_1^N); \bar{Y}_r^N) < 2e^{-\beta N} \qquad (19)$$

Before proving the theorem, we need several supporting results.
First, the following representation theorem from [23] is useful:

Theorem 3: [23] For any $u_1, u_2$, such that $u_i \in \mathcal{V}(\Lambda_c)$, $i = 1, 2$, $\sum_{k=1}^{2} u_k$ is uniquely determined
by $\{T, \sum_{k=1}^{2} u_k \bmod \Lambda_c\}$, where $T$ is an integer such that $1 \leq T \leq 2^N$.

Based on Theorem 3, $\bar{Y}_r^N$ in (16) can be represented by $\{(\sum_{i=1}^{2} (t_i^N + d_i^N)) \bmod \Lambda_c, T\}$. Since
$d_i^N$, $i = 1, 2$, are known by each node, this means $\bar{Y}_r^N$ in (16) can be represented by
$\{(t_1^N + t_2^N) \bmod \Lambda_c, T\}$.
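
The representation in Theorem 3 can be made concrete in the simplest case. The sketch below assumes a cubic coarse lattice qZ^N, which is only a special case of the lattices covered by [23], and the particular bit-wise encoding of T is an illustrative choice rather than the construction used in the proof. The names `mod_coarse`, `represent` and `reconstruct` are hypothetical.

```python
import math
import random


def mod_coarse(x, q):
    """x mod the coarse lattice q*Z^N: subtract the nearest point of q*Z^N."""
    return [v - q * math.floor(v / q + 0.5) for v in x]


def represent(u1, u2, q):
    """Map u1 + u2 to the pair ((u1 + u2) mod coarse lattice, T) with 1 <= T <= 2^N."""
    total = [a + b for a, b in zip(u1, u2)]
    reduced = mod_coarse(total, q)
    # One bit per coordinate: did the modulo operation shift this coordinate?
    bits = [1 if abs(s - m) > q / 2 else 0 for s, m in zip(total, reduced)]
    T = 1 + sum(b << k for k, b in enumerate(bits))
    return reduced, T


def reconstruct(reduced, T, q):
    """Recover u1 + u2 from the reduced sum and the integer T."""
    total = []
    for k, m in enumerate(reduced):
        shifted = ((T - 1) >> k) & 1
        if not shifted:
            total.append(m)
        else:
            # Each coordinate of the sum lies in [-q, q), so the sign of m
            # determines the direction of the shift.
            total.append(m - q if m >= 0 else m + q)
    return total
```

In this special case each coordinate of the sum needs one extra bit beyond its reduced value, so T takes at most 2^N values, in line with the range stated in Theorem 3.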

We also need the following result, which says most matrices have full rank:

Lemma 2: Let $G$ be taken from the set of linear mappings from $GF(q)^N$ to $GF(q)^r$ according
to a uniform distribution. Hence $G$ can be represented as a matrix over $GF(q)$ with $r$ rows and
$N$ columns. The probability that $G$ has full row rank is greater than $1 - q^{r-N}$.

Proof: Let $g_i$, $i = 1, \ldots, r$, be the $i$th row of $G$. Then $G$ does not have full row rank if and
only if

$$a_1 g_1 + a_2 g_2 + \cdots + a_r g_r = 0, \quad a_i \in GF(q) \qquad (20)$$

holds for some $\{a_i\}$ that are not all zero. Since at least one $a_i$ has to be non-zero, there are
$q^r - 1$ possible choices for $\{a_i\}$. For each choice of $\{a_i\}$, since one $a_i$ is not zero, there are
$q^{N(r-1)}$ solutions for $\{g_i\}$. Hence there are at most $q^{N(r-1)}(q^r - 1)$ $G$s that do not have full
row rank. There are $q^{Nr}$ possible $G$s in all, each chosen with equal probability. Hence the
probability that $G$ does not have full row rank is smaller than $q^{r-N}$, and we have Lemma 2. ■
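
Lemma 2 can be checked by brute force for very small parameters. The sketch below enumerates every r-by-N matrix over a prime field and counts those with full row rank. The parameter values in the usage note and the helper names `rank_mod_q` and `full_rank_fraction` are illustrative assumptions.

```python
from itertools import product


def rank_mod_q(rows, q):
    """Row rank of a matrix over GF(q), q prime, by Gaussian elimination."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] % q), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, q)            # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % q for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] % q:
                f = m[i][col]
                m[i] = [(a - f * b) % q for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def full_rank_fraction(q, N, r):
    """Fraction of all r x N matrices over GF(q) that have full row rank."""
    total = full = 0
    for entries in product(range(q), repeat=r * N):
        rows = [entries[i * N:(i + 1) * N] for i in range(r)]
        total += 1
        full += rank_mod_q(rows, q) == r
    return full / total
```

For q = 2, N = 3, r = 2, the enumeration covers 64 matrices, and the resulting fraction can be compared against the lower bound 1 - q^(r-N) = 0.5 from the lemma. The exhaustive loop grows as q^(rN), so this is only practical for toy sizes.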

Finally, we need the following results on privacy amplification [24], which we state here for
completeness. We begin with a couple of useful definitions:

Definition 2: For a discrete random variable $X$, the Rényi entropy $H_2(X)$ is defined as

$$H_2(X) = -\log_2 \sum_x \Pr(X = x)^2 \qquad (21)$$

The Shannon entropy $H(X)$ is defined as

$$H(X) = -\sum_x \Pr(X = x) \log_2 \Pr(X = x) \qquad (22)$$

Definition 3: [24, Definition 1] A set of functions $\mathcal{A} \to \mathcal{B}$ is a class of universal hash functions
if for a function $g$ taken from the set according to a uniform distribution, and $x_1, x_2 \in \mathcal{A}$, $x_1 \neq x_2$,
the probability that $g(x_1) = g(x_2)$ holds is at most $1/|\mathcal{B}|$.

We next state the results based on these definitions:

Lemma 3: [24] The set of linear mappings as defined in Lemma 2 is a class of universal hash
functions.
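
The universal hash property claimed in Lemma 3 can be verified exhaustively for toy parameters. The sketch below computes, for two fixed distinct inputs, the exact probability that a uniformly chosen linear map over a prime field sends them to the same output. The helper names `apply_map` and `collision_probability` and the parameter values are illustrative assumptions.

```python
from itertools import product


def apply_map(G, x, q):
    """Apply the r x N matrix G to the vector x over GF(q)."""
    return tuple(sum(g * v for g, v in zip(row, x)) % q for row in G)


def collision_probability(x1, x2, q, N, r):
    """Exact probability, over uniform G, that G maps x1 and x2 to the same value."""
    collisions = total = 0
    for entries in product(range(q), repeat=r * N):
        G = [entries[i * N:(i + 1) * N] for i in range(r)]
        total += 1
        collisions += apply_map(G, x1, q) == apply_map(G, x2, q)
    return collisions / total
```

For distinct inputs the collision probability equals q^(-r), which is exactly 1/|B| for the output set B = GF(q)^r, so the bound in Definition 3 is met with equality in this case.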



Theorem 4: [24, Corollary 4] Let $G$ be selected according to a uniform distribution from a
class of universal hash functions from $\mathcal{A}$ to $GF(q)^r$. For two random variables $A$, $B$, $A$ being
defined over $\mathcal{A}$, if for a constant $c$, $H_2(A \mid B = b) > c$, then

$$H(G(A) \mid G, B = b) \geq r \log_2 q - \frac{2^{r \log_2 q - c}}{\ln 2} \qquad (23)$$

With these preparations, we are now ready to prove Theorem 2.

Proof of Theorem 2: Define $a \oplus b$ as $(a + b) \bmod \Lambda_c$. Then for the distribution of $t_i^N$, $i = 1, 2$,
stated in Theorem 2, $t_1^N \oplus t_2^N$ is independent of $t_1^N$. Therefore we have:

$$H_2(t_1^N \mid t_1^N \oplus t_2^N = t^N) = H_2(t_1^N) = N \log_2 q \qquad (24)$$

Let $T$ be the integer defined in Theorem 3. Then according to [36, p. 106, Theorem 5.2], [25,
Lemma 3], for a given integer $a$, $1 \leq a \leq 2^N$, and $t^N \in \Lambda \cap \mathcal{V}(\Lambda_c)$, with probability $1 - 2^{-(s/2-1)}$:

$$H_2(t_1^N \mid t_1^N \oplus t_2^N = t^N, T = a) \geq H_2(t_1^N \mid t_1^N \oplus t_2^N = t^N) - \log_2 |T| - s \qquad (25)$$
$$= N(\log_2 q - 1) - s \qquad (26)$$

In Lemma 1 we have shown that if $\Lambda_c = q\Lambda$, with q being prime, then $\Lambda \cap \mathcal{V}(\Lambda_c)$ is isomorphic
to $GF(q^N)$. The isomorphism is with respect to the addition operation defined in these two sets.
Since $t_1^N \in \Lambda \cap \mathcal{V}(\Lambda_c)$, we can write $t_1^N \in GF(q^N)$. Moreover, since $GF(q^N)$ is isomorphic
to $GF(q)^N$ in terms of the addition operation defined in these two sets, we can further write
$t_1^N \in GF(q)^N$. Let G be taken from the set of linear mappings from $GF(q)^N$ to $GF(q)^r$ according
to a uniform distribution. Then $G(t_1^N)$ is well defined.

According to Lemma 3, G is a universal hash function. Hence, according to Theorem 4, we
have:

$H(G(t_1^N) | G, t_1^N \oplus t_2^N = t^N, T = a) \geq r\log_2 q - \frac{2^{r\log_2 q - c}}{\ln 2}$ (27)

where c is given by (26):

$c = N(\log_2 q - 1) - s$ (28)

Since, depending on the value of $t^N$ and a, equation (26) holds with probability $1 - 2^{-(s/2-1)}$,
from (27), we have

$H(G(t_1^N) | G, t_1^N \oplus t_2^N, T) \geq \left(1 - 2^{-(s/2-1)}\right)\left(r\log_2 q - \frac{2^{r\log_2 q - c}}{\ln 2}\right)$ (29)
Note that

$H(G(t_1^N) | G) \leq r\log_2 q$ (30)

Hence in order for $I(G(t_1^N); t_1^N \oplus t_2^N, T | G)$ to be negligible, we expect $2^{-(s/2-1)}$ and $2^{r\log_2 q - c}$
to decrease exponentially with respect to N. To achieve this, we choose $s = \epsilon' N$, where $0 <
\epsilon' < \log_2 q - 1$ so that c in (28) is positive. We choose r such that for $\delta > 0$:

$r\log_2 q < c - N\delta$ (31)
$= N(\log_2 q - 1) - s - N\delta$ (32)
$= N(\log_2 q - 1 - \epsilon' - \delta)$ (33)

We observe that if (31)-(33) are satisfied, $2^{r\log_2 q - c}$ decreases exponentially with respect to N.
We also observe that if we let $\epsilon = \epsilon' + \delta$, then (31)-(33) lead to (18).

For these choices of r and s, from (29) and (30), we observe that there exists $\beta > 0$, such
that

$I(G(t_1^N); t_1^N \oplus t_2^N, T | G) \leq e^{-\beta N}$ (34)

We next use the fact that for sufficiently large N, most Gs have full row rank as shown in
Lemma 2. Therefore, for a uniform distribution for $t_i^N$, i = 1, 2, with $t_1^N$ and $t_2^N$ being independent,
there must exist a G = g, such that

1) g has full rank.

2) From Markov inequality,

$I(G(t_1^N); t_1^N \oplus t_2^N, T | G = g) \leq 2e^{-\beta N}$ (35)

Finally, we use Theorem 3, which says $t_1^N \oplus t_2^N, T$ in (35) can be replaced by $Y_r^N$. Hence we
have proved Theorem 2. ■

The secrecy generation scheme described above will not be useful if the generated random
variable, $g(t_1^N)$, cannot serve as the random seed, x, in the AMD tuple as described in Section III.
Hence we need the following lemma on the distribution of $g(t_1^N)$.

Lemma 4: If $t_1^N$ is uniformly distributed over $GF(q^N)$, and g has full row rank, then $g(t_1^N)$
is uniformly distributed over $GF(q^r)$.
Proof: Since g has full row rank, and its elements are taken from the field $GF(q)$, it can
always be represented as

$g = [I, P]\,O$ (36)

where $O$ is an $N \times N$ invertible matrix. Hence $O(t_1^N)$ is uniformly distributed over $GF(q^N)$. I is
an $r \times r$ identity matrix. Since the sum of any two independent field elements is uniformly
distributed if one of the field elements is uniformly distributed, it can be verified that $g(t_1^N)$ is
uniformly distributed over $GF(q^r)$. ■
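Lemma 4 can be illustrated by pushing every input through a full-row-rank binary matrix and counting how often each output occurs. This is a sketch over GF(2) with hypothetical example matrices, not the construction used in the paper: a full-rank map hits every output equally often, while a rank-deficient map does not reach all outputs.

```python
from collections import Counter
from itertools import product


def output_histogram(g, n):
    """Count outputs of t -> g t over all t in GF(2)^n."""
    counts = Counter()
    for t in product((0, 1), repeat=n):
        out = tuple(sum(a * b for a, b in zip(row, t)) % 2 for row in g)
        counts[out] += 1
    return counts


g_full = [(1, 0, 1, 1), (0, 1, 1, 0)]       # rank 2: uniform output
g_deficient = [(1, 0, 1, 1), (1, 0, 1, 1)]  # rank 1: output is not uniform

hist_full = output_histogram(g_full, 4)
hist_deficient = output_histogram(g_deficient, 4)
print(dict(hist_full))
print(dict(hist_deficient))
```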

2) The General Case: When $(\Lambda, \Lambda_c)$ does not have the self-similar relationship as described
in Section V-A1, we can still extract a strongly secure random variable from a lattice point using
the same method as shown in Section V-A1. The only difference is that the map between the
extracted random variable and the lattice point will not be linear.

Consider a general N dimensional nested lattice codebook $\Lambda \cap \mathcal{V}(\Lambda_c)$. Recall that $R_0$, as
defined in (12), is the rate of the codebook. Assume $R_0 > 1$. Let $\lfloor x \rfloor$ be the operation that
rounds x to the nearest integer less than or equal to x. Define $N_0$ as

$N_0 = \lfloor \log_2 |\Lambda \cap \mathcal{V}(\Lambda_c)| \rfloor$ (37)

Then

$N_0 > NR_0 - 1$ (38)

Choose the subset K of the codebook $(\Lambda + d_1^N) \cap \mathcal{V}(\Lambda_c)$ that yields the minimal average decoding
error probability with the lattice decoder and has size $|K| = 2^{N_0}$. Define v as the one-to-one
mapping from K to $GF(2^{N_0})$. Then we have the following theorem:
Theorem 5: Let $\epsilon > 0$ be a constant such that

$R_0 - 1 - \epsilon > 0$ (39)

Then for an integer $r_0$, such that

$0 < r_0 < N(R_0 - 1 - \epsilon)$ (40)

there exists a linear mapping g from $GF(2)^{N_0}$ to $GF(2)^{r_0}$ such that
1) g has full row rank $r_0$.
2) When $t_1^N$ is uniformly distributed over K, $t_2^N$ is uniformly distributed over $(\Lambda + d_2^N) \cap \mathcal{V}(\Lambda_c)$,
and $t_1^N$, $t_2^N$ are independent of each other, we have

$I(g(v(t_1^N)); Y_r^N) \leq 2e^{-\beta N}$ (41)

for a certain $\beta > 0$.

Proof: The proof is similar to that of Theorem 2 and is given in Appendix B. ■
3) Encoder Construction: Although both Theorem 2 and Theorem 5 can be used to prove the
existence of an encoder with rate arbitrarily close to $\max\{R_0 - 1, 0\}$, with $R_0$ defined in (12),
only Theorem 5 is used in the sequel to transmit confidential messages. Theorem 2 is only used
to generate strongly secure random seeds, for which Theorem 2 is sufficient by itself. Hence in
this section, we discuss Theorem 5 only. The argument we use is as follows:

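For a given g that has full row rank, let g' be an $(N_0 - r_0) \times N_0$ matrix such that
$\begin{bmatrix} g'_{(N_0 - r_0) \times N_0} \\ g_{r_0 \times N_0} \end{bmatrix}$ is a
square matrix that is invertible. Define S and S' such that

$\begin{bmatrix} S'_{(N_0 - r_0) \times 1} \\ S_{r_0 \times 1} \end{bmatrix} = \begin{bmatrix} g'_{(N_0 - r_0) \times N_0} \\ g_{r_0 \times N_0} \end{bmatrix} v(t_1^N)$ (42)

Then $S = g(v(t_1^N))$. Define A as the inverse of $\begin{bmatrix} g'_{(N_0 - r_0) \times N_0} \\ g_{r_0 \times N_0} \end{bmatrix}$, then the encoder is given by:

$t_1^N = v^{-1}\left(A \begin{bmatrix} S'_{(N_0 - r_0) \times 1} \\ S_{r_0 \times 1} \end{bmatrix}\right)$ (43)

where $S \in GF(2^{r_0})$ is the input to the encoder. We assume S is uniformly distributed over
$GF(2^{r_0})$. $t_1^N \in \Lambda \cap \mathcal{V}(\Lambda_c)$ is the output of the encoder. S' represents the randomness in the
encoding scheme. We observe that, if $\{S'_{(N_0 - r_0) \times 1}, S_{r_0 \times 1}\}$ is uniformly distributed over $GF(2)^{N_0}$
and (43) is used as the encoder, $t_1^N$ is also uniformly distributed over the set K. Since G = g
is chosen when $t_1^N$ has a uniform distribution over K, this means that when (43) is used as an
encoder, the secrecy constraint in Theorem 5, (41), still holds.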
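The encoder construction can be sketched over GF(2) as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the completion g' is built greedily from unit vectors, the function names are hypothetical, and the final mapping $v^{-1}$ from the binary label to a lattice point of K is omitted, so `encode` returns the label $v(t_1^N)$ only.

```python
import random


def gf2_rank(rows):
    """Rank of a 0/1 matrix over GF(2)."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def gf2_inverse(m):
    """Inverse of a square 0/1 matrix over GF(2); None if singular."""
    n = len(m)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col and aug[i][col]:
                aug[i] = [a ^ b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def complete_to_invertible(g, n0):
    """Pick unit vectors g' so that the stacked matrix [g'; g] is invertible."""
    g_prime = []
    for j in range(n0):
        unit = [int(i == j) for i in range(n0)]
        if gf2_rank(g_prime + [unit] + g) == len(g_prime) + 1 + len(g):
            g_prime.append(unit)
    return g_prime


def matvec(m, v):
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in m]


def encode(g, secret_bits, n0, rng):
    """Return the label A [S'; S], with S' drawn uniformly at random."""
    g_prime = complete_to_invertible(g, n0)
    a_matrix = gf2_inverse(g_prime + g)
    s_prime = [rng.randrange(2) for _ in range(n0 - len(g))]
    return matvec(a_matrix, s_prime + secret_bits)


g = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 1]]
label = encode(g, [1, 0], n0=5, rng=random.Random(0))
print(label, matvec(g, label))
```

Applying g to the returned label recovers the secret bits S regardless of the random part S', which is the relation $S = g(v(t_1^N))$ stated above.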

Since the encoder (43) uses N channel uses to transmit an $r_0 \times 1$ binary vector, the achieved
secrecy rate is

$R_e = [R_0 - 1 - \epsilon]^+$ (44)

where $[x]^+$ equals x if $x > 0$, or 0 otherwise. According to (13), this means $R_e$ can be arbitrarily
close to

$\left[\frac{1}{2}\log_2\left(\frac{1}{2} + P\right) - 1\right]^+$ (45)
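To get a feel for the numbers, the rate expression can be evaluated directly. The sketch below assumes that (45) reads $[\frac{1}{2}\log_2(\frac{1}{2} + P) - 1]^+$ and that (44) subtracts a small $\epsilon$ from it. The function name and the sample power values are illustrative only.

```python
import math


def secrecy_rate(power, eps=0.0):
    """Evaluate [0.5*log2(0.5 + P) - 1 - eps]^+ in bits per channel use."""
    r0 = 0.5 * math.log2(0.5 + power)
    return max(r0 - 1.0 - eps, 0.0)


for p in (1.0, 3.5, 7.5, 31.5):
    print(p, secrecy_rate(p))
```

Under this reading the rate is zero until P exceeds 3.5, since $\frac{1}{2}\log_2(\frac{1}{2} + 3.5) = 1$.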



B. Comparison with Other Wiretap Coding Schemes 

Although this work leverages the same technique as [25], namely privacy amplification, it is
distinct from [25] in the following aspects:

Reference [25] proposed that one can invoke any weakly secure scheme multiple times and
extract a strongly secure key using privacy amplification. Let $\Theta(x)$ denote the set of functions
$ax + b$, where $a > 0$, $b \neq 0$, and a, b are constants. In our model, each invocation of the weakly secure
scheme involves $\Theta(N)$ channel uses, where N is the dimension of the lattice code. Suppose this
scheme is invoked M times. Then the total number of channel uses is MN. Let K denote
the generated key and $Y_r^{MN}$ be the signals observed by the eavesdropper, then the result in [25]
implies¹

$\lim_{M \to \infty} I(K; Y_r^{MN}) = 0$ (46)

In this work, $g(t_1^N)$ in Theorem 2 can be viewed as the strongly secure key. Based on Theorem 2,
we have

$\lim_{N \to \infty} -\frac{1}{N}\log I(K; Y_r^N) > 0$ (47)

Comparing (47) to (46), we observe (47) is stronger. This is because the strongly secure scheme
in Section V-A1 leverages results specific to nested lattice codes, namely Theorem 3, and extracts
the key from a single lattice point instead of a sequence of lattice points. Hence, while the
scheme we proposed in Section V-A1 is not as generally applicable as [25], we observe that
it performs better than applying [25] directly to our model.

¹To simplify the argument, we have omitted several details from [25] including "error reconciliation". Interested readers can
refer to [25] for further details.
VI. Byzantine Detection

In this section, we describe how to transmit the AMD code using the strong secrecy scheme
proposed in Section V and analyze its performance.

To transmit {x, h}, we use the idea of "message authentication codes with key manipulation
security" in [21, Section 4]. Note that for a given s, the distribution of hash tag h is in general
not uniform. Hence the distribution of h depends on the distribution of s. However, if we want to
use the strongly secure scheme in Section V-A1 to transmit h and desire to fix the hash function
G = g, we need to know the distribution of h beforehand, which is difficult since the distribution
of s is hard to determine beforehand. To solve this problem, we introduce another random seed
k from $GF(q^r)$, which can be generated via the linear coding scheme in Section V-A1. From
Lemma 4, k is uniformly distributed over $GF(q^r)$. Hence h can be transmitted by using k as a
one-time pad.

The transmission is hence divided into 4 stages: 

1) $x \in GF(q^r)$ is extracted from an N dimensional lattice code as shown in Section V-A1.

2) $k \in GF(q^r)$ is extracted from an N dimensional lattice code as shown in Section V-A1.
Let $\hat{k}$ be the estimate of k computed by node 2. Let $P_1$ be the average power per channel
use of the N dimensional lattice code.

3) $u = h \oplus k$ is transmitted by node 1 via the conventional two-hop protocol using an r-
dimensional lattice code with rate $\log_2 q$ per channel use. In this stage, node 2 remains silent.
Let $\hat{u}$ be the estimate of u computed by node 2. Let $P_2$ be the power per channel use of
the r dimensional lattice code.

4) s is transmitted via the encoder described in Section V-A2 with power $P(1 - \epsilon_P)$, where $\epsilon_P$ is a
positive constant that can be made arbitrarily small. Let $\hat{s}$ be the estimate of s computed
by node 2, which corresponds to s' in Theorem 1.
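The role of the seed x, the pad k and the tag h in the four stages can be sketched at the symbol level. The code below is a simplified illustration under explicit assumptions: it works over a prime field GF(p) rather than $GF(q^r)$, it draws x and k directly from a random generator instead of extracting them from lattice points, it takes the tag to be $h = x^{d+2} + \sum_{i=1}^{d} s_i x^i$ as in the hash rule used later in this section, and it ignores the channel entirely. All function names are hypothetical.

```python
import random


def amd_tag(x, s, p):
    """Tag h = x^(d+2) + sum_i s_i x^i over GF(p), with d = len(s)."""
    d = len(s)
    poly = sum(si * pow(x, i, p) for i, si in enumerate(s, start=1))
    return (pow(x, d + 2, p) + poly) % p


def node1_send(s, p, rng):
    """Stages 1-3: seed x, pad k, and the padded tag u = h + k."""
    x = rng.randrange(p)   # stage 1 (drawn directly in this sketch)
    k = rng.randrange(p)   # stage 2 (drawn directly in this sketch)
    u = (amd_tag(x, s, p) + k) % p  # stage 3: one-time pad on the tag
    return x, k, u


def node2_accepts(x_hat, k_hat, u_hat, s_hat, p):
    """Node 2 removes the pad and checks the tag against the received message."""
    h_hat = (u_hat - k_hat) % p
    return h_hat == amd_tag(x_hat, s_hat, p)


p = 101
s = [3, 7, 11]
x, k, u = node1_send(s, p, random.Random(1))
print(node2_accepts(x, k, u, s, p))
print(node2_accepts(x, k, u, [4, 7, 11], p))
```

An untouched transmission is always accepted. In this toy setting a modified message that changes only $s_1$ by one is accepted only if the hidden seed x happens to be zero, which illustrates why the attacker's ignorance of x matters.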

Remark 6: Note that both $P_1$ and $P_2$ are only functions of the rate of their respective lattice
code, which is $\log_2 q$. Hence $P_1$ and $P_2$ are only functions of q. Therefore, we can increase r,
while leaving $P_1$, $P_2$ unchanged. □

We next derive the following important lemma which implies the condition of AMD code
stated in Theorem 1 can be fulfilled using the transmission scheme described above:
Lemma 5: Let $s_0$ be any $d \times 1$ vector on $GF(q^r)$. Then

$I(x; \Delta_x, \Delta_h, \hat{s} | s = s_0) \leq 4\exp(-\beta N)$ (48)

where $\beta$ is a positive number defined in Theorem 2.

Proof: The proof of Lemma 5 is based on the strong secrecy offered by Theorem 2 and
Theorem 5, and is provided in Appendix C. ■
Remark 7: Lemma 5 implies that

$I(x; \Delta_x, \Delta_h, \hat{s} | s) \leq 4\exp(-\beta N)$ (49)

Since $I(x; s) = 0$, this means

$I(x; \Delta_x, \Delta_h, \hat{s}, s) \leq 4\exp(-\beta N)$ (50)

□

Remark 8: Note that $I(x; \Delta_x, \Delta_h, \hat{s} | s = s_0)$ does not depend on the error exponents of the
lattice decoder. Also, it does not depend on whether s is known by the attacker beforehand. □
We next link Lemma 5 and Theorem 1 with Pinsker's inequality, which leads to the following
main result of this paper:

Theorem 6: For the Gaussian two-hop network, for a rate smaller than but arbitrarily close to $0.5R_e$
given by (45), and a total number of channel uses $2n = \Theta(N)$:

1) When the relay is honest, the confidential message W can be transmitted at this rate such
that all the three terms $\Pr(W \neq \hat{W})$, $I(W; Y_r^n)$ and

$\Pr(\hat{W} \text{ is not accepted by Node 2} \,|\, W = w)$ (51)

decrease exponentially fast with N.

2) When the relay is not honest, the probability that the Byzantine attack goes undetected, 
i.e., the probability that the adversary wins, denoted as Pr(A wins) in ©, decreases 
exponentially fast with N. 

Proof: We use "HRH" for "hash rule holds" when, for $s \neq s'$,

$x^{d+2} + \sum_{i=1}^{d} s_i x^i = x'^{d+2} + \sum_{i=1}^{d} s'_i x'^i + \Delta_h$ (52)

This means the message s', x', h' will be accepted by node 2. Hence the probability that the
adversary wins is given by:

$\Pr(A \text{ wins}) = \sum_{x, \Delta_x, \Delta_h, s' \neq s_0} \Pr(\mathrm{HRH} | x, \Delta_h, \Delta_x, s = s_0, s') \Pr(x | \Delta_h, \Delta_x, s = s_0, s') \Pr(\Delta_h, \Delta_x, s' | s = s_0)$ (53)

Define Q(A wins) as the term (53) with $\Pr(x | \Delta_h, \Delta_x, s = s_0, s')$ replaced by $\Pr(x)$:

$Q(A \text{ wins}) = \sum_{x, \Delta_x, \Delta_h, s' \neq s_0} \Pr(\mathrm{HRH} | x, \Delta_h, \Delta_x, s = s_0, s') \Pr(x) \Pr(\Delta_h, \Delta_x, s' | s = s_0)$ (54)

Note that Q(A wins) would be the probability that the Byzantine adversary wins if x and
$\Delta_h, \Delta_x, s, s'$ are truly independent. To evaluate the effect of being otherwise, we next bound the
difference between Pr(A wins) and Q(A wins).

$|\Pr(A \text{ wins}) - Q(A \text{ wins})|$ (55)

$\leq \sum_{x, \Delta_x, \Delta_h, s' \neq s_0} \Pr(\mathrm{HRH} | x, \Delta_h, \Delta_x, s = s_0, s') \, |\Pr(x | \Delta_h, \Delta_x, s = s_0, s') - \Pr(x)| \, \Pr(\Delta_h, \Delta_x, s' | s = s_0)$ (56)

$\leq \sum_{x, \Delta_x, \Delta_h, s' \neq s_0} |\Pr(x | \Delta_h, \Delta_x, s = s_0, s') - \Pr(x)| \, \Pr(\Delta_h, \Delta_x, s' | s = s_0)$ (57)

$= \sum_{x, \Delta_x, \Delta_h, s' \neq s_0} |\Pr(x | \Delta_h, \Delta_x, s', s = s_0) - \Pr(x | s = s_0)| \, \Pr(\Delta_h, \Delta_x, s' | s = s_0)$ (58)

$= \sum_{x, \Delta_x, \Delta_h, s' \neq s_0} |\Pr(x, \Delta_h, \Delta_x, s' | s = s_0) - \Pr(x | s = s_0) \Pr(\Delta_h, \Delta_x, s' | s = s_0)|$ (59)

$\leq \sum_{x, \Delta_x, \Delta_h, s'} |\Pr(x, \Delta_h, \Delta_x, s' | s = s_0) - \Pr(x | s = s_0) \Pr(\Delta_h, \Delta_x, s' | s = s_0)|$ (60)

Then we use Pinsker's inequality [37, Theorem 2.33]

$I(A; B) \geq \frac{1}{2\ln 2} D^2(p(A, B), p(A)p(B))$ (61)

where $D(p(x), q(x)) = \sum_x |p(x) - q(x)|$.
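The form of Pinsker's inequality in (61), with mutual information in bits and D the L1 distance between the joint distribution and the product of its marginals, can be checked numerically. The sketch below uses a hypothetical 2 x 2 joint distribution and illustrative helper names.

```python
import math


def marginals(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def mutual_information_bits(joint):
    """I(A;B) in bits for a joint pmf given as a nested list."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)


def l1_distance_to_product(joint):
    """D(p(A,B), p(A)p(B)) = sum |p(a,b) - p(a)p(b)|."""
    px, py = marginals(joint)
    return sum(abs(p - px[i] * py[j])
               for i, row in enumerate(joint)
               for j, p in enumerate(row))


joint = [[0.4, 0.1], [0.1, 0.4]]
mi = mutual_information_bits(joint)
dist = l1_distance_to_product(joint)
print(mi, dist ** 2 / (2 * math.log(2)))
```

Rearranged, the inequality gives $D \leq \sqrt{(2\ln 2)\, I(A;B)}$, which is how a bound on mutual information is converted into a bound on the L1 distance in the proof.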
Let p(A) be $\Pr(x | s = s_0)$. Let p(B) be $\Pr(\Delta_h, \Delta_x, s' | s = s_0)$. Let p(A, B) be given by:

$p(A, B) = \Pr(x, \Delta_h, \Delta_x, s' | s = s_0)$ (62)



Then from Lemma 5, (60) is bounded by $\sqrt{(8\ln 2)\exp(-\beta N)}$ because of Pinsker's inequality.
Hence we have:

$|\Pr(A \text{ wins}) - Q(A \text{ wins})| \leq \sqrt{(8\ln 2)\exp(-\beta N)}$ (63)

From Theorem 1, Q(A wins) is bounded by $\frac{d+1}{q^r}$. Hence

$\Pr(A \text{ wins}) \leq \sqrt{(8\ln 2)\exp(-\beta N)} + \frac{d+1}{q^r}$ (64)
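A quick numerical evaluation shows how the two terms in the bound behave. The sketch assumes (64) reads $\sqrt{(8\ln 2)\exp(-\beta N)} + (d+1)/q^r$. The value of $\beta$ is not specified numerically in the text, so the figure used below is purely hypothetical, as are the function name and the other sample parameters.

```python
import math


def adversary_win_bound(beta, n_dim, d, q, r):
    """Evaluate sqrt(8 ln2 exp(-beta N)) + (d + 1) / q^r."""
    secrecy_term = math.sqrt(8 * math.log(2) * math.exp(-beta * n_dim))
    amd_term = (d + 1) / q ** r
    return secrecy_term + amd_term


# r grows linearly with N, here r = N / 5 (an illustrative choice).
for n_dim in (50, 100, 200):
    print(n_dim, adversary_win_bound(beta=0.1, n_dim=n_dim, d=10, q=5, r=n_dim // 5))
```

With r proportional to N, both terms shrink exponentially in N; the first term decays at half the exponent $\beta$ because of the square root.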

Each {s} conveys $dr\log_2 q$ bits of information, where r is defined in Theorem 2. Recall that
the total number of channel uses is denoted by 2n. The relay node transmits during n channel
uses. Node 1 transmits during the other n channel uses. When node 1 transmits, node 2 may or
may not transmit depending on which of the 4 stages described at the beginning of this section
is being executed. For the four-stage transmission scheme, n is given by:

$n = 2N + r + \left\lceil \frac{dr\log_2 q}{NR_e} \right\rceil N$ (65)

This is because N channel uses are needed to transmit x or k, and r channel uses are needed
to transmit $k \oplus h$. The third term in (65) is the number of channel uses needed to transmit s,
where $\lceil x \rceil$ is the operation that rounds x to the nearest integer greater than or equal to x.
The overall secrecy rate $R_T$ is given by

$R_T = \frac{dr\log_2 q}{2n}$ (66)

From (65), we observe $R_T$ can be made arbitrarily close to $0.5R_e$ by choosing a sufficiently
large d.
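The effect of d on the overall rate can be seen by evaluating (65) and (66) directly. The sketch below assumes those two equations read $n = 2N + r + \lceil dr\log_2 q / (NR_e) \rceil N$ and $R_T = dr\log_2 q / (2n)$. The parameter values are arbitrary illustrations and the function names are hypothetical.

```python
import math


def half_channel_uses(n_dim, r, d, q, rate_e):
    """n from (65): two seed stages, the padded tag, and the message stage."""
    message_blocks = math.ceil(d * r * math.log2(q) / (n_dim * rate_e))
    return 2 * n_dim + r + message_blocks * n_dim


def overall_rate(n_dim, r, d, q, rate_e):
    """R_T from (66): message bits divided by the 2n total channel uses."""
    n = half_channel_uses(n_dim, r, d, q, rate_e)
    return d * r * math.log2(q) / (2 * n)


for d in (10, 100, 1000):
    print(d, overall_rate(n_dim=100, r=20, d=d, q=5, rate_e=0.5))
```

As d grows, the fixed overhead of $2N + r$ channel uses is amortized over a longer message, and the printed rate approaches $0.5R_e = 0.25$ from below.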

Let $P_T$ denote the transmission power averaged over the channel uses during which a node
transmits. Based on the four stage transmission scheme, $P_T$ of node 1 and the relay are the
same. $P_T$ of node 2 is smaller since it does not transmit during the third stage. Hence we only
need to make sure $P_T$ of node 1 does not exceed the power constraint P. $P_T$ of node 1 is given
by

$P_T = \frac{2NP_1 + rP_2 + P(1 - \epsilon_P)\left\lceil \frac{dr\log_2 q}{NR_e} \right\rceil N}{n}$ (67)

$P_T$ can be made arbitrarily close to but strictly smaller than P by choosing a sufficiently large
d and a sufficiently small $\epsilon_P$.

Once $R_T$ and $P_T$ are fixed, d is fixed. On the other hand, as shown by (65) and (18), for a
fixed d, n increases linearly with respect to N.

Select r as in (18) such that r increases linearly with respect to N. Then, from (64), we
observe that the probability that the adversary wins decreases exponentially fast with N. Hence
we have the bound on Pr(A wins) stated in the theorem.

We next check whether the secrecy constraint is satisfied:

$I(s; Y_r(i), 0 \leq i \leq 3)$ (68)

$\leq I(x; Y_r(0)) + I(h; Y_r(1), Y_r(2)) + I(s; Y_r(3))$ (69)

In (69), the first term decreases exponentially fast with respect to N due to Theorem 2. For
the second term, writing $\bar{Y}_r$ for $Y_r$ with the channel noise removed, we have

$I(h; Y_r(1), Y_r(2)) \leq I(h; \bar{Y}_r(1), Z_r(1), h \oplus k)$ (70)

$= I(h; \bar{Y}_r(1), h \oplus k)$ (71)

$= I(h; h \oplus k) + I(h; \bar{Y}_r(1) | h \oplus k)$ (72)

$= I(h; \bar{Y}_r(1) | h \oplus k)$ (73)

$\leq I(h, k; \bar{Y}_r(1)) = I(k; \bar{Y}_r(1))$ (74)

Hence, the second term is bounded by $I(k; \bar{Y}_r(1))$, which also decreases exponentially fast with
respect to N due to Theorem 2. The third term decreases exponentially fast with respect to
$\frac{dr\log_2 q}{R_e}$ due to Theorem 5. Hence (68) decreases exponentially fast with respect to N.

Finally, we check whether the confidential message W, which corresponds to s in our scheme,
can be transmitted reliably. We observe that the probability $\Pr(W \neq \hat{W})$ does decrease
exponentially fast with respect to N because the decoding error probability of the lattice decoder
decreases at this speed, as stated in the end of Section IV.

The probability

$\Pr(\hat{W} \text{ is not accepted by Node 2} \,|\, W = w)$ (75)

depends on whether $x, k, k \oplus h$ can be transmitted reliably. Since they are also transmitted with
the nested lattice code and decoded with a lattice decoder, the probability of decoding error
when transmitting $x, k, k \oplus h$ also decreases exponentially with respect to the dimension of the
lattice, which in turn increases linearly with N. Hence (75) also decreases exponentially fast
with respect to N.

Hence we have proved the theorem. ■

Remark 9: It is evident from (63) that if Lemma 5 were weakened to just proving the left-
hand side converges to 0, which is the case if the conventional strong secrecy notion like the one
in [27] is used, then it would not be possible to preserve the exponentially decreasing detection
property offered by the AMD code. Hence in this problem, the commonly recognized strong
secrecy notion is insufficient, and a stronger notion, as described by (19), is required.

VII. Conclusion 

In this work, we developed a coding scheme which provides strong secrecy by combining 
nested lattice codes and universal hash functions. In our previous work [23], the representation
theorem for nested lattice codes is used to bound the Shannon entropy. Here we showed the same 
theorem is also useful in bounding another information theoretic measure, i.e., the Renyi entropy, 
which in turn leads to the desired strong secrecy results in a Gaussian setting. We showed that 
this coding scheme can be used with AMD codes to perform Byzantine detection for a Gaussian 
two-hop network where the relay is both an eavesdropper and a Byzantine attacker. Using this 
code, we showed that the probability that a Byzantine adversary wins decreases exponentially 
fast with respect to the number of channel uses. 

It should be noted that, in this work, we have assumed that the channel gains are known by 
each node before the communication starts. It should be recognized that the Byzantine attacker 
at the relay node may attempt to manipulate the channel estimation process, for example, by
broadcasting incorrect pilot signals, to gain an advantage. Detection of this type of misbehavior is
closely related to the physical layer implementation of the system and is left as future work. 

Appendix A 
Proof of Lemma 1

When $\Lambda_c = q\Lambda$ and the generation matrix of $\Lambda$ has full rank, there are $q^N$ lattice points in
$(\Lambda + d^N) \cap \mathcal{V}(\Lambda_c)$. Each point in $(\Lambda + d^N) \cap \mathcal{V}(\Lambda_c)$ can be represented by its coordinates, which
is a vector composed of N integers: $\{c_1, ..., c_N\}$.
We next prove the following mapping is an isomorphism from $(\Lambda + d^N) \cap \mathcal{V}(\Lambda_c)$ to the group
of a finite field $GF(q^N)$:

$I : I(c_1, ..., c_N) = \{c_1 \bmod q + (c_2 \bmod q)x + ... + (c_N \bmod q)x^{N-1}\}$ (76)

First we prove that two elements in $(\Lambda + d^N) \cap \mathcal{V}(\Lambda_c)$ cannot be mapped to the same element
in $GF(q^N)$. This can be proved via contradiction: Suppose they can. Then, we have two points
x and y, whose coordinates are $\{a_1, ..., a_N\}$ and $\{b_1, ..., b_N\}$ respectively, such that

$a_i - b_i \bmod q = 0, \quad i = 1, ..., N$ (77)

$\exists j, \; a_j \neq b_j$ (78)

This means $x - y \in q\Lambda = \Lambda_c$. Let $z \in \Lambda_c$ be $x - y$. Then $x = y + z$ and $z \neq 0$.
Define the quantization operator $Q_{\Lambda_c}(x)$ as

$Q_{\Lambda_c}(x) = \arg\min_{t \in \Lambda_c} \|t - x\|$ (79)

where $\|t - x\|$ denotes the Euclidean distance between t and x. $Q_{\Lambda_c}(x)$ has the following property:

$\forall z \in \Lambda_c$, $Q_{\Lambda_c}(x + z) = Q_{\Lambda_c}(x) + z$. This can be shown as follows:

$Q_{\Lambda_c}(x + z) = \arg\min_{t \in \Lambda_c} \|t - x - z\|$ (80)

$= \arg\min_{t - z \in \Lambda_c} \|(t - z) - x\|$ (81)

$= \arg\min_{t' \in \Lambda_c} \|t' - x\| + z$ (82)

$= Q_{\Lambda_c}(x) + z$ (83)

Since $x, y \in \mathcal{V}(\Lambda_c)$, this means $Q_{\Lambda_c}(x) = 0$ and $Q_{\Lambda_c}(y) = 0$. However we can also write
$Q_{\Lambda_c}(x) = Q_{\Lambda_c}(y + z) = Q_{\Lambda_c}(y) + z = z \neq 0$. This leads to a contradiction.

Since I cannot map two different lattice points to the same field element, and the set $(\Lambda +
d^N) \cap \mathcal{V}(\Lambda_c)$ has the same cardinality as $GF(q^N)$, I must be a one-to-one mapping.

Finally, it is easy to verify that I preserves the addition operation:

$I(x + y) = I(x) + I(y)$ (84)

This completes the proof that I is an isomorphism.
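Two steps of this proof can be illustrated on the simplest self-similar pair, the integer lattice $\Lambda = \mathbb{Z}^N$ with $\Lambda_c = q\mathbb{Z}^N$ and zero dither. The sketch below is an assumption-laden toy example, not the general construction: it checks the shift property $Q_{\Lambda_c}(x + z) = Q_{\Lambda_c}(x) + z$ and the additivity of the coordinate map, representing a field element by its tuple of polynomial coefficients modulo q. The function names are hypothetical.

```python
from itertools import product


def quantize_to_coarse(x, q):
    """Nearest point of q*Z^N to an integer vector x (ties rounded up)."""
    return tuple(q * ((2 * c + q) // (2 * q)) for c in x)


def coordinate_map(c, q):
    """Coefficients (c_1 mod q, ..., c_N mod q) of the polynomial in (76)."""
    return tuple(ci % q for ci in c)


def add_mod(a, b, q):
    return tuple((ai + bi) % q for ai, bi in zip(a, b))


q = 5
x = (4, -3, 7)
z = (5, -10, 0)  # a point of the coarse lattice q*Z^3
shifted = tuple(a + b for a, b in zip(x, z))
print(quantize_to_coarse(shifted, q), quantize_to_coarse(x, q))

# Distinct coset representatives map to distinct coefficient tuples.
images = {coordinate_map(c, q) for c in product(range(q), repeat=2)}
print(len(images))
```

The coordinate map is additive because reduction modulo q commutes with coordinate-wise addition, and it is injective on a set of coset representatives because two points with equal images differ by an element of $q\mathbb{Z}^N$.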
Appendix B
Proof of Theorem 5

For the distribution for $t_i^N$, i = 1, 2 stated in Theorem 5, $t_1^N \oplus t_2^N$ is independent from $t_1^N$.
Therefore:

$H_2(t_1^N | t_1^N \oplus t_2^N = t^N) = H_2(t_1^N) = N_0$ (85)

Then, as in (26), with probability $1 - 2^{-(s/2-1)}$:

$H_2(t_1^N | t_1^N \oplus t_2^N = t^N, T = a)$ (86)
$\geq H_2(t_1^N | t_1^N \oplus t_2^N = t^N) - \log_2 |T| - s = N_0 - N - s$ (87)

We next use the fact that when G is uniformly distributed over the set of linear functions
from $GF(2)^{N_0}$ to $GF(2)^{r_0}$, the following equation holds according to Theorem 4:

$H(G(v(t_1^N)) | G, t_1^N \oplus t_2^N = t^N, T = a) \geq r_0 - \frac{2^{r_0 - c}}{\ln 2}$ (88)

where $c = N_0 - N - s$.
Hence

$H(G(v(t_1^N)) | G, t_1^N \oplus t_2^N, T) \geq \left(1 - 2^{-(s/2-1)}\right)\left(r_0 - \frac{2^{r_0 - c}}{\ln 2}\right)$ (89)

In order for $2^{-(s/2-1)}$ to decrease exponentially fast with respect to N, we choose $s = \epsilon N$,
where $0 < \epsilon < R_0 - 1$ so that c is positive. Choose $r_0$ such that for $\delta > 0$:

$r_0 < c - N\delta/2 = N_0 - N - s - N\delta/2$ (90)

so that $2^{r_0 - c}$ decreases exponentially fast with respect to N. Recall by (38), we have $N_0 >
NR_0 - 1$. Hence a sufficient condition for (90) to hold is to require

$r_0 < N(R_0 - 1) - s - N\delta$ (91)

This yields (40). For this $r_0$ and s, from (89), we observe that there exists $\beta > 0$, such that

$I(G(v(t_1^N)); t_1^N \oplus t_2^N, T | G) \leq e^{-\beta N}$ (92)

We next use the fact that for sufficiently large N, most Gs have full row rank as shown in Lemma
2. Therefore, under a uniform distribution for $t_i^N$, i = 1, 2, with $t_1^N$ and $t_2^N$ being independent, there
must exist a G = g, such that
1) g has full rank.

2) $I(G(v(t_1^N)); t_1^N \oplus t_2^N, T | G = g) \leq 2e^{-\beta N}$
Hence we have proved Theorem 5.

Appendix C 
Proof of Lemma 5

The following notation is used in the proof: $X_i(j)$, i = 1, 2, and $X_r(j)$ denote the signals
transmitted by node 1, 2 and the relay during the jth stage, j = 0, ..., 3. Similarly, $Y_i(j)$, i =
1, 2, $Y_r(j)$, $Z_r(j)$, $Z_R(j)$ denote the signals and channel noise observed during the jth stage.
$\hat{X}_r(i)$, i = 0, ..., 3 denotes the estimate for $X_r(i)$ computed by node 2. To simplify the notation,
we omit the superscript for these signals which were used to indicate their dimensions.

As described in Section VI, the 0th stage is used to transmit x. The 1st stage is used to
transmit k. The 2nd stage is used to transmit $k \oplus h$. The 3rd stage is used to transmit s.

We next explain how to upper bound the following quantity:

$I(x; \Delta_x, \Delta_h, \hat{s} | s = s_0)$ (93)

Let $\oplus$ in $x \oplus y$ denote the addition operation in the field where x and y are taken from. Let $-x$
denote the element such that $(-x) \oplus x = 0$. Recall that g is the linear mapping whose existence
is proved in Theorem 2. With these notations, we can write $\Delta_x$ as:

$\Delta_x = g(\hat{X}_r(0) \oplus (-X_2(0))) \oplus (-x)$ (94)

$= g(\hat{X}_r(0) \oplus (-X_2(0))) \oplus g(-X_1(0))$ (95)

$= g(\hat{X}_r(0) \oplus (-(X_2(0) \oplus X_1(0))))$ (96)

Since $\Delta_x$ is a function of $\hat{X}_r(0)$ and $X_2(0) \oplus X_1(0)$, (93) is upper bounded by:

$I(x; \hat{X}_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$ (97)

$\hat{X}_r(0)$ is computed from $Y_2(0)$ by node 2. Hence (97) is upper bounded by:

$I(x; Y_2(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$ (98)
$\leq I(x; X_r(0), Z_R(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$ (99)
$= I(x; X_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$
$+ I(x; Z_R(0) | X_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s}, s = s_0)$ (100)

Recall that $Z_R(0)$ is the noise observed by Node 2 during the stage responsible for transmitting
x. We observe that it is independent from all the other terms in the second term of (100). This
is because $\Delta_h, \hat{s}, s$ are only related to signals transmitted in later stages. The relay node has no
knowledge of $Z_R(0)$. Hence $Z_R(0)$ cannot affect the relaying strategy. As a result, (100) equals

$I(x; X_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$ (101)

Recall that $M_r$ denotes the randomness available to the relay node. Then, the expression in (101)
is upper bounded by

$I(x; M_r, X_r(0), Y_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$ (102)
$= I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$
$+ I(x; X_r(0) | M_r, Y_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s}, s = s_0)$ (103)

Since $X_r(0)$ is computed from $Y_r(0)$ at the relay node, it is a deterministic function of $Y_r(0)$,
$M_r$ and potentially $s_0$. Hence the second term in (103) is 0, and (103) equals:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), \Delta_h, \hat{s} | s = s_0)$ (104)

We next examine $\Delta_h$ in (104). Recall that u is defined as $k \oplus h$. $\hat{u}$ and $\hat{k}$ are the estimates for
u and k computed by node 2 respectively. With these notations, we can express $\Delta_h$ as:

$\Delta_h = \hat{u} \oplus (-\hat{k}) \oplus (-h)$ (105)
$= \hat{u} \oplus ((-k) \oplus (-\Delta_k)) \oplus (-h)$ (106)
$= \hat{u} \oplus (-(k \oplus h)) \oplus (-\Delta_k)$ (107)

As seen from (105)-(107), $\Delta_h$ is a function of $\hat{u}$, $k \oplus h$, and $\Delta_k$. Therefore (104) can be upper
bounded by:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), \hat{u}, k \oplus h, \Delta_k, \hat{s} | s = s_0)$ (108)

Note that $\hat{u}$ is computed from $Y_2(2)$ by node 2. Therefore (108) is upper bounded by:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_2(2), k \oplus h, \Delta_k, \hat{s} | s = s_0)$ (109)
$\leq I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), X_r(2), Z_R(2), k \oplus h, \Delta_k, \hat{s} | s = s_0)$ (110)
$= I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), X_r(2), k \oplus h, \Delta_k, \hat{s} | s = s_0)$
$+ I(x; Z_R(2) | M_r, Y_r(0), X_1(0) \oplus X_2(0), X_r(2), k \oplus h, \Delta_k, \hat{s}, s = s_0)$ (111)

Again $Z_R(2)$ is independent from all the other terms in the second term of (111). Hence (111)
equals:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), X_r(2), k \oplus h, \Delta_k, \hat{s} | s = s_0)$ (112)

For $\Delta_k$, we have:

$\Delta_k = g(\hat{X}_r(1) \oplus (-X_2(1))) \oplus (-k)$ (113)
$= g(\hat{X}_r(1) \oplus (-X_2(1))) \oplus g(-X_1(1))$ (114)
$= g(\hat{X}_r(1) \oplus (-(X_2(1) \oplus X_1(1))))$ (115)

Hence $\Delta_k$ is a function of $\hat{X}_r(1)$, $X_2(1) \oplus X_1(1)$. Therefore (112) can be upper bounded by:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, \hat{X}_r(1), X_1(1) \oplus X_2(1), \hat{s} | s = s_0)$ (116)

$\hat{X}_r(1)$ is computed from $Y_2(1)$ by node 2. Hence (116) is upper bounded by:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_2(1), X_1(1) \oplus X_2(1), \hat{s} | s = s_0)$ (117)
$\leq I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, X_r(1), Z_R(1),$
$X_1(1) \oplus X_2(1), \hat{s} | s = s_0)$ (118)
$= I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, X_r(1), X_1(1) \oplus X_2(1), \hat{s} | s = s_0)$
$+ I(x; Z_R(1) | M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2),$
$k \oplus h, X_r(1), X_1(1) \oplus X_2(1), \hat{s}, s = s_0)$ (119)

$= I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1), \hat{s} | s = s_0)$ (120)
Finally, $\hat{s}$ is computed from $Y_2(3), X_2(3)$ by node 2. Hence (120) is upper bounded by:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1),$
$X_1(1) \oplus X_2(1), Y_2(3), X_2(3) | s = s_0)$ (121)
$\leq I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1),$
$X_1(1) \oplus X_2(1), X_r(3), X_2(3) | s = s_0)$ (122)

Since $X_r(3)$ is a deterministic function of $M_r$, $Y_r(3)$ and potentially $s_0$, we can upper bound
(122) with the following term by replacing $X_r(3)$ with $Y_r(3)$:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1),$
$Y_r(3), X_2(3) | s = s_0)$ (123)
$\leq I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1),$
$X_1(3), Z_r(3), X_2(3) | s = s_0)$ (124)

Equation (124) follows from $Y_r(3) = X_1(3) + X_2(3) + Z_r(3)$. We then use the fact that the
stochastic encoder used by node 1 to transmit s is independent from the stochastic mapping
used at other stages. Hence, we have:

$I(x; X_1(3), X_2(3), Z_r(3) | M_r, Y_r(0), X_1(0) \oplus X_2(0),$ (125)
$Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1), s = s_0) = 0$ (126)

and (124) equals:

$I(x; M_r, Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1) | s = s_0)$ (127)
$= I(x; M_r | s = s_0)$
$+ I(x; Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1) | M_r, s = s_0)$ (128)

Next we note that since $I(x; M_r | s = s_0) = 0$, (128) equals:

$I(x; Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1) | M_r, s = s_0)$ (129)

Equation (129) is upper bounded by:

$I(x, h; Y_r(0), X_1(0) \oplus X_2(0), Y_r(2), k \oplus h, Y_r(1), X_1(1) \oplus X_2(1) | M_r, s = s_0)$ (130)

Recall that the notation $\bar{Y}_r$, as introduced in (10), denotes the quantity obtained by subtracting
the channel noise $N_r$ from $Y_r$. Following this notation, we can upper bound (130) as:

$I(x, h; \bar{Y}_r(0), Z_r(0), X_1(0) \oplus X_2(0),$
$\bar{Y}_r(2), Z_r(2), k \oplus h, \bar{Y}_r(1), Z_r(1), X_1(1) \oplus X_2(1) | M_r, s = s_0)$ (131)
$= I(x, h; \bar{Y}_r(0), X_1(0) \oplus X_2(0),$
$\bar{Y}_r(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (132)

which is further upper bounded by:

$H(\bar{Y}_r(0), X_1(0) \oplus X_2(0) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$
$+ H(\bar{Y}_r(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$
$- H(\bar{Y}_r(0), X_1(0) \oplus X_2(0), \bar{Y}_r(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1) |$
$x, h, M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (133)
$= H(\bar{Y}_r(0), X_1(0) \oplus X_2(0) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$
$+ H(\bar{Y}_r(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$
$- H(\bar{Y}_r(0), X_1(0) \oplus X_2(0) | x, h, M_r, Z_r(i), i = 1, 2, 3, s = s_0)$
$- H(\bar{Y}_r(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1) | \bar{Y}_r(0), X_1(0) \oplus X_2(0),$
$x, h, M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (134)

We then use the two Markov chains shown below:

$\{\bar{Y}_r(0), X_1(0) \oplus X_2(0)\} - \{x, M_r, Z_r(i), i = 1, 2, 3, s\} - h$ (135)
$\{\bar{Y}_r(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1)\} - \{h, M_r, Z_r(i), i = 1, 2, 3, s\}$
$- \{x, \bar{Y}_r(0), X_1(0) \oplus X_2(0)\}$ (136)

The Markov relation in (135) holds because given x, the distribution of $\{\bar{Y}_r(0), X_1(0) \oplus X_2(0)\}$
only depends on the randomness in the transmitter of node 1 and 2 during stage 0. The Markov
chain in (136) follows because:

$k \oplus h - \{h, M_r, Z_r(i), i = 1, 2, 3, s\} - \{x, \bar{Y}_r(0), X_1(0) \oplus X_2(0)\}$ (137)

and

$\{\bar{Y}_r(2), \bar{Y}_r(1), X_1(1) \oplus X_2(1)\} - \{k \oplus h, h, M_r, Z_r(i), i = 1, 2, 3, s\}$
$- \{x, \bar{Y}_r(0), X_1(0) \oplus X_2(0)\}$ (138)

are Markov chains. Equation (137) is a Markov chain, because, given h, the distribution of $k \oplus h$
only depends on k, which is independent from all the remaining terms in (137). Equation (138)
is a Markov chain, because, given $k \oplus h$ and h, which implies k is given, the distribution of
$\{\bar{Y}_r(2), \bar{Y}_r(1), X_1(1) \oplus X_2(1)\}$ only depends on the randomness in the transmitter of node 1
and 2 during stage 1 and stage 2.

Applying the two Markov chains (135) and (136) to the last two terms in (134), we find that
it equals:

$I(x; \bar{Y}_r(0), X_1(0) \oplus X_2(0) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$
$+ I(h; \bar{Y}_r(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (139)

For the first term in (139), since $X_1(0) \oplus X_2(0) = \bar{Y}_r(0) \bmod \Lambda_c$ and hence is a function of
$\bar{Y}_r(0)$, we have

$I(x; \bar{Y}_r(0), X_1(0) \oplus X_2(0) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (140)
$= I(x; \bar{Y}_r(0) | M_r, Z_r(i), i = 1, 2, 3, s = s_0) = I(x; \bar{Y}_r(0))$ (141)

Since x is extracted from a lattice point in $GF(q^N)$ based on the strong secrecy scheme described
in Section V-A1, from Theorem 2, we have $I(x; \bar{Y}_r(0)) \leq 2\exp(-\beta N)$.

For the second term in (139), note that $\bar{Y}_r(2)$ is just $X_1(2)$, because node 2 remains silent at
this stage. Therefore, this term can be expressed as:

$I(h; X_1(2), k \oplus h, \bar{Y}_r(1), X_1(1) \oplus X_2(1) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (142)
$= I(h; \bar{Y}_r(1), X_1(2) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$
$+ I(h; k \oplus h, X_1(1) \oplus X_2(1) | \bar{Y}_r(1), X_1(2), M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (143)

The second term in (143) is 0 since $k \oplus h$ is a deterministic function of $X_1(2)$ and $X_1(1) \oplus X_2(1)$
is a deterministic function of $\bar{Y}_r(1)$. Therefore (143) equals

$I(h; \bar{Y}_r(1), X_1(2) | M_r, Z_r(i), i = 1, 2, 3, s = s_0)$ (144)
$= I(h; \bar{Y}_r(1), X_1(2))$ (145)

Since $X_1(2)$ is determined by $h \oplus k$, (145) is upper bounded by:

$I(h; \bar{Y}_r(1), h \oplus k)$ (146)
$= I(h; h \oplus k) + I(h; \bar{Y}_r(1) | h \oplus k)$ (147)
$= I(h; \bar{Y}_r(1) | h \oplus k)$ (148)
$\leq I(h, k; \bar{Y}_r(1)) = I(k; \bar{Y}_r(1))$ (149)

Since k is extracted from a lattice point in $GF(q^N)$ based on the strong secrecy scheme
described in Section V-A1, from Theorem 2, (149) is bounded by $2\exp(-\beta N)$.
Therefore (139) is bounded by $4\exp(-\beta N)$. Hence we have Lemma 5.